Abstract. An attractive mechanism to specify global constraints in rostering 
and other domains is via formal languages. For instance, the REGULAR and 
Grammar constraints specify constraints in terms of the languages accepted 
by an automaton and a context-free grammar respectively. Taking advantage of 
the fixed length of the constraint, we give an algorithm to transform a context-free 
grammar into an automaton. We then study the use of minimization techniques 
to reduce the size of such automata and speed up propagation. We show that 
minimizing such automata after they have been unfolded and domains initially 
reduced can give automata that are more compact than minimizing before un- 
folding and reducing. Experimental results show that such transformations can 
improve the size of rostering problems that we can "model and run". 



1 Introduction 

Constraint programming provides a wide range of tools for modelling and efficiently 
solving real world problems. However, modelling remains a challenge even for experts. 
Some recent attempts to simplify the modelling process have focused on specifying
constraints using formal language theory. For example, the REGULAR [1] and GRAMMAR
constraints [2, 3] permit constraints to be expressed in terms of automata and
grammars. In this paper, we make two contributions. First, we investigate the relationship
between REGULAR and GRAMMAR. In particular, we show that it is often beneficial to 
reformulate a GRAMMAR constraint as a REGULAR constraint. Second, we explore the 
effect of minimizing the automaton specifying a REGULAR constraint. We prove that 
by minimizing this automaton after unfolding and initial constraint propagation, we can 
get an exponentially smaller and thus more efficient representation. We show that these 
transformations can improve runtimes by over an order of magnitude. 



2 Background 

A constraint satisfaction problem consists of a set of variables, each with a domain of
values, and a set of constraints specifying allowed combinations of values for given
subsets of variables. A solution is an assignment to the variables satisfying the
constraints. A constraint is domain consistent iff for each variable, every value in its
domain can be extended to an assignment that satisfies the constraint. We will consider
constraints specified by automata and grammars. An automaton $A = (\Sigma, Q, q_0, F, \delta)$
consists of an alphabet $\Sigma$, a set of states $Q$, an initial state $q_0$, a set of accepting states
$F$, and a transition relation $\delta$ defining the possible next states given a starting state
and symbol. The automaton is deterministic (DFA) if there is only one possible next
state, non-deterministic (NFA) otherwise. A string $s$ is recognized by $A$ iff starting
from the state $q_0$ we can reach one of the accepting states using the transition
relation $\delta$. Both DFAs and NFAs recognize precisely regular languages. The constraint
REGULAR($A$, $[X_1, \ldots, X_n]$) is satisfied iff $X_1$ to $X_n$ is a string accepted by $A$ [1].
Pesant has given a domain consistency propagator for REGULAR based on unfolding
the DFA to give an $n$-layer automaton which only accepts strings of length $n$ [1].
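
The unfolding idea can be illustrated with a small sketch. It is not Pesant's incremental propagator; it is a one-shot forward and backward reachability pass over the layered automaton, and the function name `regular_supports`, the dictionary encoding of $\delta$ and the example DFA for the language $a^+b^+$ are all assumptions made for illustration.

```python
def regular_supports(delta, start, accepting, domains):
    """Return, for each position, the values that lie on some accepting path."""
    n = len(domains)
    # Forward pass: states reachable at each layer 0..n.
    forward = [{start}]
    for dom in domains:
        forward.append({delta[(q, a)] for q in forward[-1] for a in dom
                        if (q, a) in delta})
    # Backward pass: states at each layer from which acceptance is reachable.
    backward = [set() for _ in range(n)] + [forward[n] & accepting]
    supports = [set() for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for q in forward[i]:
            for a in domains[i]:
                # A missing transition yields None, which is never in the set.
                if delta.get((q, a)) in backward[i + 1]:
                    backward[i].add(q)
                    supports[i].add(a)
    return supports


# Hypothetical DFA for a+b+: state 0 is initial, state 2 is accepting.
DELTA = {(0, "a"): 1, (1, "a"): 1, (1, "b"): 2, (2, "b"): 2}
result = regular_supports(DELTA, 0, {2}, [{"a", "b"}] * 3)
```

With three variables that all start with domain $\{a, b\}$, only the strings aab and abb are accepted, so the sketch leaves $\{a\}$, $\{a, b\}$ and $\{b\}$ as the supported values.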

Given an automaton $A$, we write $\mathrm{unfold}_n(A)$ for the unfolded and layered form of
$A$ that just accepts words of length $n$ which are in the regular language, $\min(A)$ for
the canonical form of $A$ with minimal number of states, and $\mathrm{simplify}(A)$ for the simplified
form of $A$ constructed by deleting transitions and states that are no longer reachable
after domains have been reduced. We write $f_A(n) \ll g_A(n)$ iff $f_A(n) \le g_A(n)$ for all
$n$, and there exist $A$ such that $\log g_A(n) = \Omega(f_A(n))$. That is, $g_A(n)$ is never smaller than
$f_A(n)$ and there are cases where it is exponentially larger.

A context-free grammar is a tuple $G = (T, H, P, S)$, where $T$ is a set of terminal
symbols called the alphabet of $G$, $H$ is a set of non-terminal symbols, $P$ is a set of
productions and $S$ is a unique starting symbol. A production is a rule $A \to \alpha$ where
$A$ is a non-terminal and $\alpha$ is a sequence of terminals and non-terminals. A string in
$\Sigma^*$ is generated by $G$ if we start with the sequence $\alpha = \langle S \rangle$ and non-deterministically
generate $\alpha'$ by replacing any non-terminal $A$ in $\alpha$ by the right hand side of any
production $A \to \alpha$ until $\alpha'$ contains only terminals. A context-free language $\mathcal{L}(G)$ is the
language of strings generated by the context-free grammar $G$. A context-free grammar
is in Chomsky normal form if all productions are of the form $A \to BC$ where $B$ and
$C$ are non-terminals or $A \to a$ where $a$ is a terminal. Any context-free grammar can
be converted to one that is in Chomsky normal form with at most a linear increase in
its size. A grammar $G_a$ is acyclic iff there exists a partial order $\prec$ of the non-terminals,
such that for every production $A_1 \to A_2 A_3$, $A_1 \prec A_2$ and $A_1 \prec A_3$. The constraint
GRAMMAR($[X_1, \ldots, X_n]$, $G$) is satisfied iff $X_1$ to $X_n$ is a string accepted by $G$ [2, 3].

Example 1. As the running example we use the GRAMMAR($[X_1, X_2, X_3]$, $G$)
constraint with domains $D(X_1) = \{a\}$, $D(X_2) = \{a, b\}$, $D(X_3) = \{b\}$ and the grammar
$G$ in Chomsky normal form [3] $\{S \to AB,\ A \to AA \mid a,\ B \to BB \mid b\}$.

Since we only accept strings of a fixed length, we can convert any context free 
grammar to a regular grammar. However, this may increase the size of the grammar 
exponentially. Similarly, any NFA can be converted to a DFA, but this may increase the 
size of the automaton exponentially. 

3 Grammar constraint 

We briefly describe the domain consistency propagator for the GRAMMAR constraint
proposed in [2, 3]. This propagator is based on the CYK parser for context-free
grammars. It constructs a dynamic programming table $V$ where an element $A$ of $V[i, j]$ is a
non-terminal that generates a substring from the domains of variables $X_i, \ldots, X_{i+j-1}$
that can be extended to a solution of the constraint using the domains of the other
variables. The table $V$ produced by the propagator for Example 1 is given in Figure 1.
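
As an illustration of the table $V$, the following sketch builds it for the running example with a bottom-up CYK pass followed by a top-down pass that keeps only entries used in some derivation of the start symbol. The function name `cyk_table`, the tuple encoding of productions and the dictionary keyed by `(i, j)` are assumptions of this sketch, not the data structures of the propagator in [2, 3].

```python
from itertools import product


def cyk_table(binary, unary, start, domains):
    """Build the pruned CYK table V[(i, j)] with 1-based start i and length j."""
    n = len(domains)
    V = {(i, 1): {A for A, a in unary if a in domains[i - 1]}
         for i in range(1, n + 1)}
    # Bottom-up: A is in V[i, j] if A -> B C with B in V[i, k], C in V[i+k, j-k].
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            V[(i, j)] = {A for k in range(1, j) for A, B, C in binary
                         if B in V[(i, k)] and C in V[(i + k, j - k)]}
    # Top-down: keep only entries used in some derivation of the start symbol.
    keep = {key: set() for key in V}
    if start in V[(1, n)]:
        keep[(1, n)].add(start)
    for j in range(n, 1, -1):
        for i in range(1, n - j + 2):
            for A, k in product(keep[(i, j)], range(1, j)):
                for lhs, B, C in binary:
                    if lhs == A and B in V[(i, k)] and C in V[(i + k, j - k)]:
                        keep[(i, k)].add(B)
                        keep[(i + k, j - k)].add(C)
    return keep


# Grammar and domains of Example 1.
BINARY = [("S", "A", "B"), ("A", "A", "A"), ("B", "B", "B")]
UNARY = [("A", "a"), ("B", "b")]
TABLE = cyk_table(BINARY, UNARY, "S", [{"a"}, {"a", "b"}, {"b"}])
```

For the running example the pruned table contains $S$ in $V[1,3]$, $A$ in $V[1,2]$, $B$ in $V[2,2]$, and both $A$ and $B$ in $V[2,1]$, which matches the non-terminals that appear in the acyclic grammar of Example 3.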




An alternative view of the dynamic programming table produced by this propagator
is as an AND/OR graph [4]. This is a layered DAG, with layers alternating between
AND-NODES and OR-NODES. Each OR-NODE in the AND/OR graph corresponds to
an entry $A \in V[i,j]$. An OR-NODE has a child AND-NODE for each production $A \to BC$
so that $A \in V[i,j]$, $B \in V[i,k]$ and $C \in V[i+k, j-k]$. The children of
this AND-NODE are the OR-NODES that correspond to the entries $B \in V[i,k]$ and
$C \in V[i+k, j-k]$. Note that the AND/OR graph constructed in this manner is
equivalent to the table $V$ [4], so we use them interchangeably in this paper.



Fig. 2. AND/OR graph. 



Every derivation of a string $s \in \mathcal{L}(G)$ can be represented as a tree that is a subgraph
of the AND/OR graph and therefore can be represented as a trace in V. Since every 
possible derivation can be represented this way, both the table V and the corresponding 
AND/OR graph are a compilation of all solutions of the Grammar constraint. 

4 Reformulation into an automaton 

The time complexity of propagating a GRAMMAR constraint is $O(n^3|G|)$, as opposed
to $O(n|\delta|)$ for a REGULAR constraint. Therefore, reformulating a GRAMMAR
constraint as a REGULAR constraint may improve propagation speed if it does not require a
large transition relation. In addition, we can perform optimizations such as minimizing
the automaton. In this section, we argue that reformulation is practical in many cases
(sections 4.1-4.3), and there is a polynomial test to determine the size of the resulting
NFA (section 4.4). In the worst case, the resulting NFA is exponentially larger than the
original GRAMMAR constraint as the following example shows. Therefore, performing
the transformation itself is not a suitable test of the feasibility of the approach.

Example 2. Consider GRAMMAR($[X_1, \ldots, X_n]$, $G$) where $G$ generates
$L = \{ww^R \mid w \in \{0, 1\}^{n/2}\}$. Solutions of GRAMMAR can be compiled into the dynamic programming
table of size $O(n^3)$, while an equivalent NFA that accepts the same language has
exponential size. Note that an exponential separation does not immediately follow from
that between regular and context-free grammars, because solutions of the GRAMMAR
constraint are the strict subset of $\mathcal{L}(G)$ which have length $n$.

In the rest of this section we describe the reformulation in three steps. First, we 
convert into an acyclic grammar (section 4.1), then into a pushdown automaton (sec- 
tion 4.2), and finally we encode this as a NFA (section 4.3). The first two steps are well 
known in formal language theory but we briefly describe them for clarity. 

4.1 Transformation into an acyclic grammar 

We first construct an acyclic grammar $G_a$ such that the language $\mathcal{L}(G_a)$ coincides with
solutions of the GRAMMAR constraint. Given the table $V$ produced by the GRAMMAR
propagator (section 3), we construct an acyclic grammar in the following way. For
each possible derivation of a nonterminal $A$, $A \to BC$, such that $A \in V[i,j]$, $B \in V[i,k]$
and $C \in V[i+k, j-k]$, we introduce a production $A_{i,j} \to B_{i,k} C_{i+k,j-k}$
in $G_a$ (lines 11-17 of algorithm 1). The start symbol of $G_a$ is $S_{1,n}$. By construction,
the obtained grammar $G_a$ is acyclic. Every production in $G_a$ is of the form
$A_{i,j} \to B_{i,k} C_{i+k,j-k}$ and nonterminals $B_{i,k}$, $C_{i+k,j-k}$ occur in rows below the $j$th row in $V$.
Example 3 shows the grammar $G_a$ obtained by Algorithm 1 on our running example.

Example 3. The acyclic grammar $G_a$ constructed from our running example.

$S_{1,3} \to A_{1,2} B_{3,1} \mid A_{1,1} B_{2,2}$    $A_{1,2} \to A_{1,1} A_{2,1}$    $B_{2,2} \to B_{2,1} B_{3,1}$

$A_{i,1} \to a_i$    $B_{i,1} \to b_i$    $\forall i \in \{1, 2, 3\}$

To prove equivalence, we recall that traces of the table $V$ represent all possible
derivations of GRAMMAR solutions. Therefore, every derivation of a solution can be
simulated by productions from $G_a$. For instance, consider the solution $(a, a, b)$ of
GRAMMAR from Example 1. A possible derivation of this string is
$S|_{S \in V[1,3]} \to AB|_{A \in V[1,2], B \in V[3,1]} \to AAB|_{A \in V[1,1], A \in V[2,1], B \in V[3,1]} \to aAB|_{\ldots} \to aaB|_{\ldots} \to aab|_{\ldots}$.
We can simulate this derivation using productions in $G_a$:
$S_{1,3} \to A_{1,2} B_{3,1} \to A_{1,1} A_{2,1} B_{3,1} \to a_1 A_{2,1} B_{3,1} \to a_1 a_2 B_{3,1} \to a_1 a_2 b_3$.

Observe that the acyclic grammar $G_a$ is essentially a labelling of the AND/OR
graph, with non-terminals corresponding to OR-NODES and productions corresponding
to AND-NODES. Thus, we use the notation $G_a$ to refer to both the AND/OR graph and
the corresponding acyclic grammar.



Algorithm 1 Transformation to an Acyclic Grammar

 1: procedure CONSTRUCTACYCLICGRAMMAR(in: $X$, $G$, $V$; out: $G_a$)
        ▷ $T$ is the set of terminals in $G_a$
        ▷ $H$ is the set of nonterminals in $G_a$
        ▷ $P$ is the set of productions in $G_a$
 6:         $V[i,1] = \{A \mid A \to a \in G, a \in D(X_i)\}$
 7:         for $A \in V[i,1]$ s.t. $A \to a \in G$, $a \in D(X_i)$ do
 8:             $T = T \cup \{a_i\}$
 9:             $H = H \cup \{A_{i,1}\}$
10:             $P = P \cup \{A_{i,1} \to a_i\}$
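
The construction of $G_a$ described in this section can be sketched as follows. This is a minimal illustration and not a transcription of Algorithm 1: the function name `acyclic_grammar`, the tuple encoding `(name, i, j)` for the indexed non-terminal $A_{i,j}$ and `(a, i)` for the indexed terminal $a_i$, and the hard-coded pruned table for the running example are assumptions. Unlike the listing in Example 3, the sketch only emits terminal productions whose value is still in the domain of $X_i$.

```python
def acyclic_grammar(V, binary, unary, domains):
    """Return productions of G_a as (lhs, rhs) pairs over indexed symbols."""
    n = len(domains)
    productions = []
    # Terminal productions A_{i,1} -> a_i for every supported value a of X_i.
    for i in range(1, n + 1):
        for A, a in unary:
            if A in V[(i, 1)] and a in domains[i - 1]:
                productions.append(((A, i, 1), ((a, i),)))
    # Binary productions A_{i,j} -> B_{i,k} C_{i+k,j-k}.
    for (i, j), entries in V.items():
        for k in range(1, j):
            for A, B, C in binary:
                if A in entries and B in V[(i, k)] and C in V[(i + k, j - k)]:
                    productions.append(
                        ((A, i, j), ((B, i, k), (C, i + k, j - k))))
    return productions


# Pruned table of the running example (assumed to come from the propagator).
V = {(1, 1): {"A"}, (2, 1): {"A", "B"}, (3, 1): {"B"},
     (1, 2): {"A"}, (2, 2): {"B"}, (1, 3): {"S"}}
BINARY = [("S", "A", "B"), ("A", "A", "A"), ("B", "B", "B")]
UNARY = [("A", "a"), ("B", "b")]
GA = acyclic_grammar(V, BINARY, UNARY, [{"a"}, {"a", "b"}, {"b"}])
```

On the running example this yields four binary productions, the same ones listed in Example 3, and four terminal productions.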




4.2 Transformation into a pushdown automaton 

Given an acyclic grammar $G_a = (T, H, P, S_{1,n})$ from the previous section, we now
construct a pushdown automaton $P_a(\langle S_{1,n} \rangle, T, T \cup H, \delta, Q_P, F_P)$, where $\langle S_{1,n} \rangle$ is the
initial stack of $P_a$, $T$ is the alphabet, $T \cup H$ is the set of stack symbols, $\delta$ is the transition
function, and $Q_P = F_P = \{q_P\}$ is the single initial and accepting state. We use an
algorithm that encodes a context-free grammar into a pushdown automaton (PDA) that
computes the leftmost derivation of a string [5]. The stack maintains the sequence of symbols
that are expanded in this derivation. At every step, the PDA non-deterministically uses
a production to expand the top symbol of the stack if it is a non-terminal, or consumes
a symbol of the input string if it matches the terminal at the top of the stack.

We now describe this reformulation in detail. There exists a single state $q_P$ which is
both the starting and an accepting state. For each non-terminal $A_{i,j}$ in $G_a$ we introduce
the set of transitions $\delta(q_P, \varepsilon, A_{i,j}) = \{(q_P, \beta) \mid A_{i,j} \to \beta \in G_a\}$. For each terminal
$a_i \in G_a$, we introduce a transition $\delta(q_P, a_i, a_i) = \{(q_P, \varepsilon)\}$. The automaton $P_a$ accepts
on the empty stack. This constructs a pushdown automaton accepting $\mathcal{L}(G_a)$.

Example 4. The pushdown automaton $P_a$ constructed for the running example.

$\delta(q_P, \varepsilon, S_{1,3}) = (q_P, A_{1,2} B_{3,1})$    $\delta(q_P, \varepsilon, S_{1,3}) = (q_P, A_{1,1} B_{2,2})$
$\delta(q_P, \varepsilon, A_{1,2}) = (q_P, A_{1,1} A_{2,1})$    $\delta(q_P, \varepsilon, B_{2,2}) = (q_P, B_{2,1} B_{3,1})$
$\delta(q_P, \varepsilon, A_{i,1}) = (q_P, a_i)$    $\delta(q_P, \varepsilon, B_{i,1}) = (q_P, b_i)$    $\forall i \in \{1, 2, 3\}$
$\delta(q_P, a_i, a_i) = (q_P, \varepsilon)$    $\delta(q_P, b_i, b_i) = (q_P, \varepsilon)$    $\forall i \in \{1, 2, 3\}$

4.3 Transformation into a NFA 

Finally, we construct an NFA $(\Sigma, Q, Q_0, F_0, \sigma)$, denoted $N_a$, using the PDA from the
last section. States of this NFA encode all possible configurations of the stack of the
PDA that can appear in parsing a string from $G_a$. To reflect that a state of the NFA
represents a stack, we write states as sequences of symbols $\langle \alpha \rangle$, where $\alpha$ is a possibly
empty sequence of symbols and $\alpha[0]$ is the top of the stack. For example, the initial
state is $\langle S_{1,n} \rangle$ corresponding to the initial stack $\langle S_{1,n} \rangle$ of $P_a$. Algorithm 2 unfolds the
PDA in a similar way to unfolding the DFA. Note that the NFA accepts only strings of
length $n$ and has the initial state $Q_0 = \langle S_{1,n} \rangle$ and the single final state $F_0 = \langle \rangle$.



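Algorithm 2 Transformation to NFA 

 1: procedure PDA TO NFA(in: $P_a$; out: $N_a$)
 2:     $Q_u = \{\langle S_{1,n} \rangle\}$                 ▷ $Q_u$ is the set of unprocessed states
 3:     $Q = \emptyset$                                     ▷ $Q$ is the set of states in $N_a$
 4:     $\sigma = \emptyset$                                ▷ $\sigma$ is the set of transitions in $N_a$
 5:     $Q_0 = \{\langle S_{1,n} \rangle\}$                 ▷ $Q_0$ is the initial state in $N_a$
 6:     $F_0 = \{\langle \rangle\}$                         ▷ $F_0$ is the set of final states in $N_a$
 7:     while $Q_u$ is not empty do                         ▷ $q$ denotes any state in $Q_u$
 8:         if $q = \langle A_{i,j}, \alpha \rangle$ then
 9:             for each transition $\delta(q_P, \varepsilon, A_{i,j}) = (q_P, \beta) \in \delta$ do
10:                 $\sigma = \sigma \cup \{\sigma(\langle A_{i,j}, \alpha \rangle, \varepsilon) = \langle \beta, \alpha \rangle\}$
11:                 if $\langle \beta, \alpha \rangle \notin Q$ then
12:                     $Q_u = Q_u \cup \{\langle \beta, \alpha \rangle\}$
13:             $Q = Q \cup \{\langle A_{i,j}, \alpha \rangle\}$
14:         else if $q = \langle a_i, \alpha \rangle$ then
15:             for each transition $\delta(q_P, a_i, a_i) = (q_P, \varepsilon) \in \delta$ do
16:                 $\sigma = \sigma \cup \{\sigma(\langle a_i, \alpha \rangle, a_i) = \langle \alpha \rangle\}$
17:                 if $\langle \alpha \rangle \notin Q$ then
18:                     $Q_u = Q_u \cup \{\langle \alpha \rangle\}$
19:             $Q = Q \cup \{\langle a_i, \alpha \rangle\}$
20:         $Q_u = Q_u \setminus \{q\}$
21:     $N_a(\Sigma, Q, Q_0, F_0, \sigma) = \varepsilon\text{-Closure}(N_a(\Sigma, Q, Q_0, F_0, \sigma))$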



We start from the initial stack $\langle S_{1,n} \rangle$ and find all distinct stack configurations that
are reachable from this stack using transitions from $P_a$. For each reachable stack
configuration we create a state in the NFA and add the corresponding transitions. If the new
stack configurations are the result of expansion of a production in the original grammar,
these transitions are $\varepsilon$-transitions, otherwise they consume a symbol from the input
string. Note that if a non-terminal appears on top of the stack and gets replaced, then
it cannot appear in any future stack configuration due to the acyclicity of $G_a$.
Therefore $|\alpha|$ is bounded by $O(n)$ and Algorithm 2 terminates. The size of $N_a$ is $O(|G_a|^n)$
in the worst case. The automaton $N_a$ that we obtain before line 21 is an acyclic NFA
with $\varepsilon$-transitions. It accepts the same language as the PDA $P_a$ since every path
between the starting and the final state of $N_a$ is a trace of the stack configurations of $P_a$.
Figure 3(a) shows the automaton $N_a$ with $\varepsilon$-transitions constructed from the running
example. After applying the $\varepsilon$-closure operation, we obtain a layered NFA that does not
have $\varepsilon$-transitions (line 21) (Figure 3(b)).
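
The unfolding of the PDA into stack-configuration states can be sketched for the running example as follows. This is an illustrative sketch rather than Algorithm 2 itself: it stops before the $\varepsilon$-closure step, it writes indexed symbols as plain strings such as `"S13"`, and the names `pda_to_nfa` and `accepted_strings` are assumptions. Only the productions of Example 3 that are reachable from $S_{1,3}$ are listed.

```python
from collections import deque

# Productions of G_a for the running example, one entry per non-terminal.
PRODUCTIONS = {
    "S13": [("A12", "B31"), ("A11", "B22")],
    "A12": [("A11", "A21")],
    "B22": [("B21", "B31")],
    "A11": [("a1",)], "A21": [("a2",)],
    "B21": [("b2",)], "B31": [("b3",)],
}


def pda_to_nfa(productions, start):
    """Enumerate reachable stack configurations; return (states, transitions)."""
    initial = (start,)
    states, transitions = {initial}, []
    queue = deque([initial])
    while queue:
        stack = queue.popleft()
        if not stack:            # empty stack: the single accepting state
            continue
        top, rest = stack[0], stack[1:]
        if top in productions:   # non-terminal on top: epsilon-moves (label None)
            moves = [(None, rhs + rest) for rhs in productions[top]]
        else:                    # terminal on top: consume it
            moves = [(top, rest)]
        for label, target in moves:
            transitions.append((stack, label, target))
            if target not in states:
                states.add(target)
                queue.append(target)
    return states, transitions


def accepted_strings(transitions, initial):
    """Collect the labels along every path from the initial state to ()."""
    out = {}
    for src, label, dst in transitions:
        out.setdefault(src, []).append((label, dst))
    results, todo = set(), [(initial, ())]
    while todo:
        state, word = todo.pop()
        if not state:
            results.add(word)
        for label, dst in out.get(state, []):
            todo.append((dst, word if label is None else word + (label,)))
    return results


STATES, TRANSITIONS = pda_to_nfa(PRODUCTIONS, "S13")
LANGUAGE = accepted_strings(TRANSITIONS, ("S13",))
```

Because $G_a$ is acyclic, both loops terminate. For the running example the sketch reaches 14 stack configurations, including the empty stack, and accepts exactly the two indexed strings $a_1 a_2 b_3$ and $a_1 b_2 b_3$.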

4.4 Computing the size of the NFA 

As the NFA may be exponential in size, we provide a polynomial method of computing
its size in advance. We can use this to decide if it is practical to transform it in this way.
Observe first that the transformation of a PDA to an NFA maintains a queue of states
that correspond to stack configurations. Each state corresponds to an OR-NODE in the
AND/OR graph and each state of an OR-NODE $v$ is generated from the states of the
parent OR-NODES of $v$. This suggests a relationship between paths in the AND/OR
graph of the CYK algorithm and states in $N_a$. We use this relationship to compute
a loose upper bound for the number of states in $N_a$ in time linear in the size of the
AND/OR graph by counting the number of paths in that graph. Alternatively, we
compute the exact number of states in $N_a$ in time quadratic in the size of the AND/OR
graph.

Fig. 3. $N_a$ produced by Algorithm 2 

Theorem 1. There exists a surjection between paths in $G_a$ from the root to OR-NODES
and stack configurations in the PDA $P_a$.

Proof. Consider a path $p$ from the root of the AND/OR graph to an OR-NODE
labelled with $A_{i,j}$. We construct a stack configuration $\Gamma(p)$ that corresponds to $p$. We
start with the empty stack $\Gamma = \langle \rangle$. We traverse the path from the root to $A_{i,j}$. For every
AND-NODE $v_i \in p$, with left child $v_l$ and right child $v_r$, if the successor of $v_i$ in $p$ is
$v_l$, then we push $v_r$ on $\Gamma$, otherwise do nothing. When we reach $A_{i,j}$, we push it on $\Gamma$.
The final configuration $\Gamma$ is unique for $p$ and corresponds to the stack of the PDA after
having parsed the substring $1 \ldots i-1$ and having non-deterministically chosen to parse
the substring $i \ldots i+j-1$ using a production with $A_{i,j}$ on the LHS.

We now show that all stack configurations can be generated by the procedure
above. Every stack configuration corresponds to at least one partial leftmost
derivation of a string. We say a stack configuration $\langle \alpha \rangle$ corresponds to a derivation
$dv = \langle a_1, \ldots, a_{k-1}, A_{k,j}, \alpha \rangle$ if $\alpha$ is the content of the stack after parsing the prefix of the
string of length $k + j$. Therefore, it is enough to show that all partial leftmost
derivations (we omit the prefix of terminals) can be generated by the procedure above. We
prove this by contradiction. Suppose that $\langle a_1, \ldots, a_{i-1}, B_{i,j}, \beta \rangle$ is the partial leftmost
derivation such that $\Gamma(p(root, B_{i,j})) \ne \beta$, where $p(root, B_{i,j})$ is the path from the root
to the OR-NODE $B_{i,j}$, and for any partial derivation $\langle a_1, \ldots, a_{k-1}, A_{k,j}, \alpha \rangle$ such that
$k < i$ and $A_{k,j} \in G_a$, $\Gamma(p(root, A_{k,j})) = \alpha$. Consider the production rule that introduces
the nonterminal $B_{i,j}$ to the partial derivation. If the production rule is $D \to C, B_{i,j}$,
then the partial derivation is $\langle a_1, \ldots, a_f, D, \beta \rangle \Rightarrow|_{D \to C, B_{i,j}} \langle a_1, \ldots, a_f, C, B_{i,j}, \beta \rangle$.
The path from the root to the node $B_{i,j}$ is a concatenation of the paths from $D$ to $B_{i,j}$
and from the root to $D$. Therefore, $\Gamma(p(root, B_{i,j}))$ is constructed as a concatenation
of $\Gamma(p(D, B_{i,j}))$ and $\Gamma(p(root, D))$. $\Gamma(p(D, B_{i,j}))$ is empty because the node $B_{i,j}$
is the right child of the AND-NODE that corresponds to the production $D \to C, B_{i,j}$ and
$\Gamma(p(root, D)) = \beta$ because $f < i$. Therefore, $\Gamma(p(root, B_{i,j})) = \beta$. If the production
rule is $D \to B_{i,j}, C$, then the partial derivation is
$\langle a_1, \ldots, a_{i-1}, D, \gamma \rangle \Rightarrow|_{D \to B_{i,j}, C} \langle a_1, \ldots, a_{i-1}, B_{i,j}, C, \gamma \rangle = \langle a_1, \ldots, a_{i-1}, B_{i,j}, \beta \rangle$.
Then $\Gamma(p(root, D)) = \gamma$, because $i - 1 < i$, and $\Gamma(p(D, B_{i,j})) = \langle C \rangle$, because the node $B_{i,j}$ is the left
child of the AND-NODE that corresponds to the production $D \to B_{i,j}, C$. Therefore,
$\Gamma(p(root, B_{i,j})) = \langle C, \gamma \rangle = \beta$. This leads to a contradiction. □



Example 5. An example of the mapping described in the last proof is in Figure 4(a) for
the grammar of our running example. Consider the OR-NODE $A_{1,1}$. There are 2 paths
from $S_{1,3}$ to $A_{1,1}$. One is direct and uses only OR-NODES $\langle S_{1,3}, A_{1,1} \rangle$ and the other
uses OR-NODES $\langle S_{1,3}, A_{1,2}, A_{1,1} \rangle$. The 2 paths are mapped to 2 different stack
configurations $\langle A_{1,1}, B_{2,2} \rangle$ and $\langle A_{1,1}, A_{2,1}, B_{3,1} \rangle$ respectively. We highlight edges that are
incident to AND-NODES on each path and lead to the right children of these AND-NODES.
There is exactly one such edge for each element of a stack configuration. □

Note that theorem 1 only specifies a surjection from paths to stack configurations,
not a bijection. Indeed, different paths may produce the same configuration $\Gamma$.

Example 6. Consider the grammar $G = \{S \to AA,\ A \to a \mid AA \mid BC,\ B \to b \mid BB,\ C \to c \mid CC\}$
and the AND/OR graph of this grammar for a string of length 5. The path
$\langle S_{1,5}, A_{2,4}, B_{2,2} \rangle$ uses the productions $S_{1,5} \to A_{1,1} A_{2,4}$ and $A_{2,4} \to B_{2,2} C_{4,2}$, while
the path $\langle S_{1,5}, A_{3,3}, B_{3,1} \rangle$ uses the productions $S_{1,5} \to A_{1,2} A_{3,3}$ and $A_{3,3} \to B_{3,1} C_{4,2}$.
Both paths map to the same stack configuration $\langle C_{4,2} \rangle$. □

By construction, the resulting NFA has one state for each stack configuration of
the PDA in parsing a string. Since each path corresponds to a stack configuration, the
number of states of the NFA before applying $\varepsilon$-closure is bounded by the number of
paths from the root to any OR-NODE in the AND/OR graph. This is cheap to compute
using the following recursive algorithm [6]:

$$PD(v) = \begin{cases} 1 & \text{if } v \text{ has no incoming edges} \\ \sum_{p} PD(p) & \text{where } p \text{ is a parent of } v \end{cases} \qquad (1)$$

Therefore, the number of states of the NFA $N_a$ is at most $\sum_v PD(v)$, where $v$ is
an OR-NODE of $G_a$ (Figure 4).

We can compute the exact number of states in $N_a$ before $\varepsilon$-closure without
constructing the NFA by counting paths in the stack graph $G_v$ for each OR-NODE $v$. The
stack graph captures the observation that each element of a stack configuration
generated from a path $p$ is associated with exactly one edge $e$ that is incident on $p$ and leads
to the right child of an AND-NODE. $G_v$ contains one path for each sequence of such
edges, so that if two paths $p$ and $p'$ in $G_a$ are mapped to the same stack configuration,
they are also mapped to the same path in $G_v$. Formally, the stack graph of an OR-NODE
$v \in V(G_a)$ is a DAG $G_v$, such that for every stack configuration $\Gamma$ of $P_a$ with $k$
elements, there is exactly one path $p$ in $G_v$ of length $k$ and $v'$ is the $i$-th vertex of $p$ if and
only if $v'$ is the $i$-th element from the top of $\Gamma$.
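
The path-count recurrence (1) can be sketched on the AND/OR graph of the running example. The encoding is an assumption of this sketch: each AND-NODE is given as a triple (parent, left child, right child) of OR-NODES written as `(name, i, j)`, terminal OR-NODES are left out, and `path_counts` is a hypothetical helper name. Since every AND-NODE has exactly one parent, it inherits that parent's count and passes it on to both children.

```python
from collections import defaultdict

# One triple per AND-NODE of the running example: (parent, left, right).
AND_NODES = [
    (("S", 1, 3), ("A", 1, 2), ("B", 3, 1)),
    (("S", 1, 3), ("A", 1, 1), ("B", 2, 2)),
    (("A", 1, 2), ("A", 1, 1), ("A", 2, 1)),
    (("B", 2, 2), ("B", 2, 1), ("B", 3, 1)),
]


def path_counts(and_nodes, root):
    """PD(v): number of root-to-v paths, for every non-terminal OR-NODE v."""
    pd = defaultdict(int)
    pd[root] = 1
    # A parent always spans more positions than its children, so visiting
    # AND-NODES by decreasing span length of the parent is a topological order.
    for parent, left, right in sorted(and_nodes, key=lambda t: -t[0][2]):
        pd[left] += pd[parent]
        pd[right] += pd[parent]
    return dict(pd)


COUNTS = path_counts(AND_NODES, ("S", 1, 3))
UPPER_BOUND = sum(COUNTS.values())
```

For the running example the bound is 9, while only 8 distinct stack configurations have a non-terminal on top: the two paths to $B_{3,1}$ both map to the configuration $\langle B_{3,1} \rangle$, which illustrates why the path count is only an upper bound.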



Example 7. Consider the grammar of the running example and the OR-NODE $A_{1,1}$ in
the AND/OR graph. The stack graph for this OR-NODE is shown in figure 4(b).
Along the path $\langle S_{1,3}, A_{1,1} \rangle$, only the edge that leads to $B_{2,2}$ generates a stack element.
This edge is mapped to the edge $(A_{1,1}, B_{2,2})$ in $G_{A_{1,1}}$. Similarly, the edges that lead to
$A_{2,1}$ and $B_{3,1}$ are mapped to the edges $(A_{1,1}, A_{2,1})$ and $(A_{2,1}, B_{3,1})$ respectively. □

Fig. 4. Computing the size of $N_a$. (a) AND/OR graph $G_a$. (b) Stack graph $G_{A_{1,1}}$.

Since $G_v$ is a DAG, we can efficiently count the number of paths in it. We construct
$G_v$ using algorithm 3. The graph $G_v$ computed in algorithm 3 for an OR-NODE $v$ has
as many paths as there are unique stack configurations in $P_a$ with $v$ at the top.



Algorithm 3 Computing the stack DAG $G_v$ of an OR-NODE $v$

 1: procedure STACKGRAPH(in: $G_a$, $v$; out: $G_v$)
 2:     $V(G_v) = \{v\}$
 3:     $label(v) = \{v\}$
 4:     $Q = \{(v, v_p) \mid v_p \in parents(v)\}$          ▷ queue of edges
 5:     while $Q$ not empty do
 6:         $(v_c, v_p) = pop(Q)$
 7:         if $v_p$ is an AND-NODE $\wedge$ $v_c$ is left child of $v_p$ then
 8:             $v_r = children_r(v_p)$
 9:             $V(G_v) = V(G_v) \cup \{v_r\}$
10:             $E(G_v) = E(G_v) \cup \{(v_l, v_r) \mid v_l \in label(v_c)\}$
11:             $label(v_p) = label(v_p) \cup \{v_r\}$
12:         else
13:             $label(v_p) = label(v_p) \cup label(v_c)$
14:         $Q = Q \cup \{(v_p, v'_p) \mid v'_p \in parents(v_p)\}$



Theorem 2. There exists a bijection between paths in $G_v$ and states in the NFA $N_a$
which correspond to stacks with $v$ at the top.

Proof. Let $p$ be a path from the root to $v$ in $G_a$. First, we show that every path $p'$ in
$G_v$ corresponds to a stack configuration, by mapping $p$ to $p'$. Therefore $p'$ corresponds
to $\Gamma(p)$. We then show that $p'$ is unique for $\Gamma(p)$. This establishes a bijection between
paths in $G_v$ and stack configurations.

We traverse the inverse of $p$, denoted $inv(p)$, and construct $p'$ incrementally. Note
that every vertex in $inv(p)$ is examined by algorithm 3 in the construction of $G_v$.
If $inv(p)$ visits the left child of an AND-NODE, we append the right child of that
AND-NODE to $p'$. This vertex is in $G_v$ by line 7. By the construction of $\Gamma(p)$ in the
proof of theorem 1, a symbol is placed on the stack if and only if it is the right child
of an AND-NODE, hence if and only if it appears in $p'$. Moreover, if a vertex is the $i$-th
vertex in a path, it corresponds to the $i$-th element from the top of $\Gamma(p)$. We now see
that $p'$ is unique for $\Gamma(p)$. Two distinct paths of length $k$ cannot map to the same stack
configuration, because they must differ in at least one position $i$, therefore they
correspond to stacks with different symbols at position $i$. Therefore, there exists a bijection
between paths in $G_v$ and stack configurations with $v$ at the top. □

Hence $|Q(N_a)| = \sum_v \#paths(G_v)$, where $v$ is an OR-NODE of $G_a$. Computing
the stack graph $G_v$ of every OR-NODE $v$ takes $O(|G_a|)$ time, as does counting paths
in $G_v$. Therefore, computing the number of states in $N_a$ takes $O(|G_a|^2)$ time. We can
also compute the number of states in the $\varepsilon$-closure of $N_a$ by observing that if none of
the OR-NODES that are reachable by paths of length 2 from an OR-NODE $v$ correspond
to terminals, then any state that corresponds to a stack configuration with $v$ at the top
will only have outgoing $\varepsilon$-transitions and will be removed by the $\varepsilon$-closure. Thus, to
compute the number of states in $N_a$ after $\varepsilon$-closure, we sum the number of paths in $G_v$,
for all OR-NODES $v$ such that a terminal OR-NODE can be reached from $v$ by a path of
length 2.

4.5 Transformation into a DFA 

Finally, we convert the NFA into a DFA using the standard subset construction. This is
optional as Pesant's propagator for the REGULAR constraint works just as well with
NFAs as DFAs. Indeed, removing non-determinism may increase the size of the
automaton and slow down propagation. However, converting into a DFA opens up the
possibility of further optimizations. In particular, as we describe in the next section,
there are efficient methods to minimize the size of a DFA. By comparison,
minimization of a NFA is PSPACE-hard in general [7]. Even when we consider just the acyclic
NFA constructed by unfolding a NFA, minimization remains NP-hard [8].

5 Automaton minimization 

The DFA constructed by this or other methods may contain redundant states and transi- 
tions. We can speed up propagation of the Regular constraint by minimizing the size 
of this automaton. Minimization can be either offline (i.e. before we have the problem 
data and have unfolded the automaton) or online (i.e. once we have the problem data 
and have unfolded the automaton). There are several reasons why we might prefer an 
online approach where we unfold before minimizing. First, although minimizing after 
unfolding may be more expensive than minimizing before unfolding, both are cheap 
to perform. Minimizing a DFA takes $O(Q \log Q)$ time using Hopcroft's algorithm and
$O(nQ)$ time for the unfolded DFA where $Q$ is the number of states [9]. Second, thanks
to Myhill-Nerode's theorem, minimization does not change the layered nature of the 
unfolded DFA. Third, and perhaps most importantly, minimizing a DFA after unfold- 
ing can give an exponentially smaller automaton than minimizing the DFA and then 
unfolding. To put it another way, unfolding may destroy the minimality of the DFA. 



Theorem 3. Given any DFA $A$, $|\min(\mathrm{unfold}_n(A))| \ll |\mathrm{unfold}_n(\min(A))|$.

Proof: To show $|\min(\mathrm{unfold}_n(A))| \le |\mathrm{unfold}_n(\min(A))|$, we observe that both
$\min(\mathrm{unfold}_n(A))$ and $\mathrm{unfold}_n(\min(A))$ are automata that recognize the same
language. By definition, minimization returns the smallest DFA accepting this language.
Hence $\min(\mathrm{unfold}_n(A))$ cannot be larger than $\mathrm{unfold}_n(\min(A))$.

To show unfolding then minimizing can give an exponentially smaller sized DFA,
consider the following language $L$. A string of length $k$ belongs to $L$ iff it contains the
symbol $j$, $j = k \bmod n$, where $n$ is a given constant. The alphabet of the language $L$
is $\{0, \ldots, n-1\}$. The minimal DFA for this language has $\Omega(n 2^n)$ states as each state
needs to record which symbols from 0 to $n-1$ have been seen so far, as well as the
current length of the string mod $n$. Unfolding this minimal DFA and restricting it to
strings of length $n$ gives an acyclic DFA with $\Omega(n 2^n)$ states. Note that all strings are
of length $n$ and the equation $j = n \bmod n$ has the single solution $j = 0$. Therefore, the
language $L$ consists of the strings of length $n$ that contain the symbol 0. On the other
hand, if we unfold and then minimize, we get an acyclic DFA with just $2n$ states. Each
layer of the DFA has two states which record whether 0 has been seen. □
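
The $2n$ figure in the proof can be checked with a small sketch. It is a simplification and not the construction in the proof: the unfolded automaton here tracks only the set of symbols seen so far (the length modulo $n$ is implicit in the layer), states are merged layer by layer from the last layer backwards when they have identical outgoing behaviour, and states that cannot reach acceptance are dropped. The function names are assumptions.

```python
def unfold_seen_sets(n):
    """Layered DFA over {0..n-1}; layer k holds the reachable sets of seen symbols."""
    layers = [{frozenset()}]
    for _ in range(n):
        layers.append({s | {a} for s in layers[-1] for a in range(n)})
    return layers


def minimized_size(n):
    """Merge equivalent states layer by layer, from the last layer to the first."""
    layers = unfold_seen_sets(n)
    # Class of each state in the last layer; None marks a dead state.
    next_class = {s: ("accept" if 0 in s else None) for s in layers[n]}
    total = len({c for c in next_class.values() if c is not None})
    for k in range(n - 1, -1, -1):
        ids, current = {}, {}
        for s in layers[k]:
            # Signature: class of the successor on each symbol of the alphabet.
            sig = tuple(next_class[s | {a}] for a in range(n))
            if any(c is not None for c in sig):
                current[s] = ids.setdefault(sig, (k, len(ids)))
            else:
                current[s] = None
        total += len(ids)
        next_class = current
    return total


UNFOLDED = sum(len(layer) for layer in unfold_seen_sets(4))
MINIMIZED = minimized_size(4)
```

For $n = 4$ the unfolded automaton of this sketch has 44 states and the merged one has 8, that is $2n$: one initial state, two states in each of the layers 1 to $n-1$, and one accepting state.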

Further, if we make our initial problem domain consistent, domains might be pruned
which give rise to possible simplifications of the DFA. We show here that we should
also perform such simplification before minimizing.

Theorem 4. Given any DFA $A$,
$|\min(\mathrm{simplify}(\mathrm{unfold}_n(A)))| \ll |\mathrm{simplify}(\min(\mathrm{unfold}_n(A)))|$.

Proof: Both $\min(\mathrm{simplify}(\mathrm{unfold}_n(A)))$ and $\mathrm{simplify}(\min(\mathrm{unfold}_n(A)))$ are
DFAs that recognize the same language of strings of length $n$. By definition,
minimization must return the smallest DFA accepting this language. Hence
$\min(\mathrm{simplify}(\mathrm{unfold}_n(A)))$ is no larger than $\mathrm{simplify}(\min(\mathrm{unfold}_n(A)))$.

To show that minimization after simplification may give an exponentially smaller
sized automaton, consider the language which contains sequences of integers from 1
to $n$ in which at least one integer is repeated and in which the last two integers are
different. The alphabet of the language $L$ is $\{1, \ldots, n\}$. The minimal unfolded DFA for
strings of length $n$ from this language has $\Omega(2^n)$ states as each state needs to record
which integers have been seen. Suppose the integer $n$ is removed from the domain of
each variable. The simplified DFA still has $\Omega(2^n)$ states to record which integers 1 to
$n-1$ have been seen. On the other hand, suppose we simplify before we minimize. By
a pigeonhole argument, we can ignore the constraint that an integer is repeated. Hence
we just need to ensure that the string is of length $n$ and that the last two integers are
different. The minimal DFA accepting this language requires just $O(n)$ states. □

6 Empirical results 

We empirically evaluated the results of our method on a set of shift-scheduling
benchmarks [11, 14]¹. Experiments were run with the Minisat+ solver for pseudo-Boolean
instances and Gecode 2.2.0 for constraint problems, on an Intel Xeon 4 CPU, 2.0 GHz,
4G RAM. We use a timeout of 3600 sec in all experiments. The problem is to schedule
employees to activities subject to various rules, e.g. a full-time employee has one hour
for lunch. These rules are specified by a context-free grammar augmented with
restrictions on productions [4]. A schedule for an employee has $n = 96$ slots of 15 minutes
represented by $n$ variables. In each slot, an employee can work on an activity ($a_i$), take
a break ($b$), lunch ($l$) or rest ($r$). These rules are specified by the following grammar:

¹ We would like to thank Louis-Martin Rousseau and Claude-Guy Quimper for providing us
with the benchmark data.

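$S \to RPR$, $f_P(i,j) \equiv 13 \le j \le 24$;    $P \to WbW$;    $L \to lL \mid l$, $f_L(i,j) \equiv j = 4$
$S \to RFR$, $f_F(i,j) \equiv 30 \le j \le 38$;    $R \to rR \mid r$;    $W \to A_i$, $f_W(i,j) \equiv j \ge 4$
$A_i \to a_i A_i \mid a_i$, $f_A(i,j) \equiv open(i)$;    $F \to PLP$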

where functions $f(i,j)$ are predicates that restrict the start and length of any string
matched by a specific production, and $open(i)$ is a function that returns 1 if the business
is open at the $i$-th slot and 0 otherwise. In addition, the business requires a certain number
of employees working in each activity at given times during the day. We minimize the
number of slots in which employees work such that the demand is satisfied.

As shown in [4], this problem can be converted into a pseudo-Boolean (PB) model.
The GRAMMAR constraint is converted into a SAT formula in conjunctive normal form
using the AND/OR graph. To model labour demand for a slot we introduce Boolean
variables $b(i, j, a_k)$, equal to 1 if the $j$-th employee performs activity $a_k$ at time slot $i$. For
each time slot $i$ and activity $a_k$ we post a pseudo-Boolean constraint
$\sum_{j=1}^{m} b(i, j, a_k) \ge d(i, a_k)$, where $m$ is the number of employees. The objective is modelled using the
function $\sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{a} b(i, j, a_k)$. Additionally, the problem can be formulated as an
optimization problem in a constraint solver, using a matrix model with one row for
each employee. We post a GRAMMAR constraint on each row, AMONG constraints on
each column for labour demand and LEX constraints between adjacent rows to break
symmetry. We use the static variable and value ordering used in [4].

We compare this with reformulating the GRAMMAR constraint as a REGULAR constraint.
Using Algorithm 3, we computed the size of an equivalent NFA. Surprisingly, this is
not too big, so we converted the GRAMMAR constraint to a DFA and then minimized it.
In order to reduce the blow-up that may occur when converting an NFA to a DFA, we
heuristically minimized the NFA using the following simple observation: two states
are equivalent if they have identical outgoing transitions. We traverse the NFA from
the last to the first layer and merge equivalent states, and then apply the same
procedure to the reversed NFA. We repeat until we cannot find a pair of equivalent
states. We also simplified the original CYK table, taking into account whether the
business is open or closed at each slot. Theorem 4 suggests such simplification can
significantly reduce the size both of the CYK table and of the resulting automata. In
practice we also observe a significant reduction in size: the minimized automaton
obtained before simplification is about ten times larger than the minimized DFA
obtained after simplification. Table 1 gives the sizes of the representations at each
step. We see from this that the minimized DFA is always smaller than the original CYK
table. Interestingly, the subset construction generates the minimum DFA from the NFA,
even in the case of two activities, and heuristic minimization of the NFA achieves a
notable reduction.
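
The heuristic NFA minimization described above can be sketched in Python as follows.
This is a minimal illustration under stated assumptions, not the authors'
implementation: the NFA is layered, every edge goes from one layer to the next, all
states in the first layer are initial and all states in the last layer are accepting.
The names `merge_sweep`, `heuristic_minimize` and `language` are hypothetical.

```python
def merge_sweep(layers, edges):
    """Sweep from the last layer to the first, merging states of a layer whose
    outgoing transitions (after merges in later layers) are identical."""
    out = {}
    for u, a, v in edges:
        out.setdefault(u, set()).add((a, v))
    rep = {}          # state -> representative it was merged into
    changed = False
    new_layers = []
    for layer in reversed(layers):
        seen = {}     # outgoing-transition signature -> representative state
        kept = []
        for q in layer:
            sig = frozenset((a, rep[v]) for a, v in out.get(q, ()))
            if sig in seen:
                rep[q] = seen[sig]
                changed = True
            else:
                seen[sig] = rep[q] = q
                kept.append(q)
        new_layers.append(kept)
    new_layers.reverse()
    return new_layers, {(rep[u], a, rep[v]) for u, a, v in edges}, changed


def reverse(layers, edges):
    """Reverse the layer order and the direction of every edge."""
    return layers[::-1], {(v, a, u) for u, a, v in edges}


def heuristic_minimize(layers, edges):
    """Alternate sweeps on the NFA and on its reversal until nothing merges."""
    while True:
        layers, edges, c1 = merge_sweep(layers, edges)
        layers, edges, c2 = merge_sweep(*reverse(layers, edges))
        layers, edges = reverse(layers, edges)
        if not (c1 or c2):
            return layers, edges


def language(layers, edges):
    """Enumerate accepted words (only sensible for small examples)."""
    reached = {(q, "") for q in layers[0]}
    for _ in layers[1:]:
        reached = {(v, w + a) for q, w in reached for u, a, v in edges if u == q}
    return {w for _, w in reached}
```

In the NFA with edges 0-a->1, 0-a->2, 1-a->3 and 2-b->3, states 1 and 2 have different
outgoing transitions, so the first sweep merges nothing; the sweep on the reversed NFA
merges them because their incoming transitions are identical.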

Table 1. Shift Scheduling Problems. $G_a$ is the acyclic grammar, $N_a^{\epsilon}$ is
the NFA with $\epsilon$-transitions, $N_a$ is the NFA without $\epsilon$-transitions,
$\min(N_a)$ is the minimized NFA, $A$ is the DFA obtained from $\min(N_a)$, $\min(A)$
is the minimized $A$, $a$ is the number of activities, # is the benchmark number.

For each instance, we used the resulting DFA in place of the GRAMMAR constraint in
both the CP model and the PB model, using the encoding of the REGULAR constraint (DFA
or NFA) into CNF [10]. We compare the model that uses the PB encoding of the GRAMMAR
constraint (GR$_1$) with two models that use the PB encoding of the REGULAR constraint
(REGULAR$_1$, REGULAR$_2$), a CP model that uses the GRAMMAR constraint (GR$_1^{CP}$)
and a CP model that uses a REGULAR constraint (REGULAR$_1^{CP}$). REGULAR$_1$ and
REGULAR$_1^{CP}$ use the DFA, whilst REGULAR$_2$ uses the NFA constructed after
simplification according to when the business is closed.

The performance of a SAT solver can be sensitive to the ordering of the clauses in
the formula. To test the robustness of the models, we randomly shuffled each of the PB
instances to generate 10 equivalent problems and averaged the results over 11
instances. Also, the GRAMMAR and REGULAR constraints were encoded into a PB formula in
two different ways. The first encoding ensures that unit propagation (UP) enforces
domain consistency on the constraint. The second encoding ensures that UP detects
disentailment of the constraint, but does not always enforce domain consistency. For
the GRAMMAR constraint we omit the same set of clauses as in [4] to obtain the weaker
PB encoding. For the REGULAR constraint we omit the set of clauses that performs the
backward propagation of the REGULAR constraint. Note that Table 2 shows the median
time and the number of backtracks to prove optimality over 11 instances. For each
model we show the best median time and the corresponding number of backtracks for the
PB encoding that achieves domain consistency and for the weaker encoding.
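
The shuffling protocol can be sketched in Python as below. This is an illustration
only: it assumes that the 11 instances are the original formula plus its 10 shuffled
copies, that an unsolved run is recorded as `None`, and that the median is taken over
solved runs. The function names and the seed parameter are hypothetical.

```python
import random
import statistics


def shuffled_variants(clauses, copies=10, seed=0):
    """Return the original clause list followed by `copies` random reorderings.

    Reordering clauses does not change the set of solutions, so all variants
    are equivalent problems.
    """
    rng = random.Random(seed)
    variants = [list(clauses)]
    for _ in range(copies):
        variant = list(clauses)
        rng.shuffle(variant)
        variants.append(variant)
    return variants


def median_over_solved(times):
    """Median runtime over the runs that finished (None marks a timeout)."""
    solved = [t for t in times if t is not None]
    return statistics.median(solved) if solved else None
```

Fixing the seed keeps the shuffled instances reproducible across solver runs.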

Table 2 shows the results of our experiments using these 5 models. The model
REGULAR$_2$ outperforms GR$_1$ in all benchmarks, whilst model REGULAR$_1$ outperforms
GR$_1$ in most of the benchmarks. The model REGULAR$_2$ also proves optimality in
several instances of hard benchmarks. It should be noted that performing
simplification before minimization is essential. It significantly reduces the size of
the encoding and speeds up MiniSat+ by a factor of 5*. Finally, we note that the PB
models consistently outperformed the CP models, in agreement with the observations of
[4]. Between the two CP models, REGULAR$_1^{CP}$ is significantly better than
GR$_1^{CP}$, finding a better solution in many instances and proving optimality in two
instances. In addition, although we do not show it in the table, Gecode is
approximately three orders of magnitude faster per branch with the REGULAR$_1^{CP}$
model. For instance, in benchmark number 2 with 1 activity and 4 workers, it explores
approximately 80 million branches with the REGULAR$_1^{CP}$ model and 24000 branches
with the GR$_1^{CP}$ model within the 1 hour timeout.



* Due to lack of space, we do not show these results.



Table 2. Shift Scheduling Problems. GR$_1$ is the PB model with GRAMMAR, REGULAR$_1$
is the PB model with $\min(simplify(DFA))$, REGULAR$_2$ is the PB model with
$\min(simplify(NFA))$, GR$_1^{CP}$ is the CSP model with GRAMMAR, REGULAR$_1^{CP}$ is
the CSP model with $\min(simplify(DFA))$. We show the time and number of backtracks to
prove optimality (the median time and the median number of backtracks for the PB
encoding over solved shuffled instances), the number of activities, the number of
workers and the benchmark number #.

7 Other related work

Beldiceanu et al. [12] and Pesant [1] proposed specifying constraints using automata
and provided filtering algorithms for such specifications. Quimper and Walsh [3] and
Sellmann [2] then independently proposed the GRAMMAR constraint. Both gave a
monolithic propagator based on the CYK parser. Quimper and Walsh [4] proposed a CNF
decomposition of the GRAMMAR constraint, while Bacchus [10] proposed a CNF
decomposition of the REGULAR constraint. Kadioglu and Sellmann [13] improved the space
efficiency of the propagator for the GRAMMAR constraint by a factor of $n$. Their
propagator was evaluated on the same shift scheduling benchmarks as here. However, as
they only found feasible solutions and did not prove optimality, their results are not
directly comparable. Côté, Gendron, Quimper and Rousseau proposed a mixed-integer
programming (MIP) encoding of the GRAMMAR constraint [14]. Experiments on the same
shift scheduling problem used here show that such encodings are competitive.

There is a body of work on other methods to reduce the size of constraint
representations. Closest to this work is Lagerkvist, who observed that a REGULAR
constraint represented as a multi-valued decision diagram (MDD) is no larger than that
represented by a DFA that is minimized and then unfolded [15]. An MDD is similar to an
unfolded and then minimized DFA, except that an MDD can have long edges which skip
over layers. We extend this observation by proving an exponential separation in size
between such representations. As a second example, Katsirelos and Walsh compressed
table constraints representing allowed or disallowed tuples using decision tree
methods [16]. They also used a compressed representation for tuples that can provide
exponential savings in space. As a third example, Carlsson proposed the CASE
constraint, which can be represented by a DAG where each node represents a range of
values for a variable, and a path from the root to a leaf represents a set of
satisfying assignments [17].
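
The difference between a DFA that is minimized and then unfolded and one that is
unfolded and then minimized can be illustrated with a small Python sketch. It unfolds
a DFA over $n$ steps, keeps only states that lie on an accepting path, and then merges
states of a layer that have identical outgoing transitions. The dictionary encoding of
the transition function and the function names are assumptions for the example; long
edges as used in MDDs are not modelled.

```python
def unfold(delta, start, accepting, alphabet, n):
    """Return, for each layer 0..n, the DFA states lying on an accepting path
    of length exactly n. delta maps (state, symbol) to the next state."""
    forward = [{start}]
    for _ in range(n):
        forward.append({delta[q, a] for q in forward[-1]
                        for a in alphabet if (q, a) in delta})
    layers = [set() for _ in range(n + 1)]
    layers[n] = forward[n] & set(accepting)
    for t in range(n - 1, -1, -1):
        layers[t] = {q for q in forward[t]
                     if any(delta.get((q, a)) in layers[t + 1] for a in alphabet)}
    return layers


def minimized_layer_sizes(delta, start, accepting, alphabet, n):
    """Number of states per layer after merging, from the last layer to the
    first, states with identical outgoing transitions."""
    layers = unfold(delta, start, accepting, alphabet, n)
    cls = {q: 0 for q in layers[n]}   # surviving final states form one class
    sizes = [1 if layers[n] else 0]
    for t in range(n - 1, -1, -1):
        sig_ids, new_cls = {}, {}
        for q in layers[t]:
            sig = frozenset((a, cls[delta[q, a]]) for a in alphabet
                            if delta.get((q, a)) in layers[t + 1])
            new_cls[q] = sig_ids.setdefault(sig, len(sig_ids))
        cls = new_cls
        sizes.append(len(sig_ids))
    return sizes[::-1]
```

For the DFA with transitions 0-a->1, 0-b->2, 1-a->3, 1-b->1, 2-a->3, 2-b->4, 4-a->3
and accepting state 3, states 1 and 2 accept different languages in general, yet with
one symbol left to read they behave identically, so they are merged after unfolding to
length 2.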

8 Conclusions 

We have shown how to transform a GRAMMAR constraint into a REGULAR constraint. In the
worst case, the transformation may increase the space required to represent the
constraint. However, in practice, we observed that such a transformation reduces the
space required to represent the constraint and speeds up propagation. We argued that
the transformation also permits us to compress the representation using standard
techniques for automaton minimization. We proved that minimizing such automata after
they have been unfolded and domains initially reduced can give automata that are
exponentially more compact than those obtained by minimizing before unfolding and
reducing. Experimental results demonstrated that such transformations can increase the
size of rostering problems that can be solved.